30 research outputs found

    From Self-interest To Commons: Distinct Aspect Of Social Bookmarking Services

    With web content being generated and shared at an ever-increasing pace, a number of approaches to effectively controlling and retrieving content have been developed. Social tagging is a widely implemented method for classifying content resulting from the dispersed activities of users. A social bookmarking service (SBM) is a web service whose purpose is to make information generally available on a shared basis. Tags accumulate on an SBM mainly through activities that satisfy individual self-interest, without inviting the collaboration of others. An SBM is in fact the optimal web platform for using the sum of such activities to form a commons.

    Crowdsourcing chart digitizer : task design and quality control for making legacy open data machine-readable

    Despite recent open data initiatives in many countries, a significant percentage of the data provided is in non-machine-readable formats such as images rather than in a machine-readable electronic format, which restricts its usability. Various types of software for digitizing data chart images have been developed. However, such software is designed for manual use and thus requires human intervention, making it unsuitable for automatically extracting data from a large number of chart images. This paper describes the first unified framework for converting legacy open data in chart images into a machine-readable and reusable format by using crowdsourcing. Crowd workers are asked not only to extract data from an image of a chart but also to reproduce the chart objects in a spreadsheet. The properties of the reproduced chart objects give their data structures, including series names and values, which are useful for automatic processing of data by computer. Since results produced by crowdsourcing inherently contain errors, a quality control mechanism was developed that improves accuracy by aggregating tables created by different workers for the same chart image and by utilizing the data structures obtained from the reproduced chart objects. Experimental results demonstrated that the proposed framework and mechanism are effective. The proposed framework is not intended to compete with chart digitizing software, and workers can use such software if they feel it is useful for extracting data from charts. Experiments in which workers were encouraged to use such software showed that the extracted data still contained errors even when workers used it. This indicates that quality control is necessary even if workers use software to extract data from chart images.

    The Possibility of Identifying Literature That Uses Language Resources Through Data Citation: From an Analysis of BCCWJ

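    National Institute of Informatics. Conference: Language Resources Workshop 2019; venue: National Institute for Japanese Language and Linguistics; dates: September 2-4, 2019; organizer: Center for Corpus Development, National Institute for Japanese Language and Linguistics. Based on a survey of citation information for language resource data, we discuss the discoverability of research literature that makes use of that data. To this end, we surveyed citations of resources such as the Balanced Corpus of Contemporary Written Japanese (BCCWJ) in the proceedings of the Annual Meeting of the Association for Natural Language Processing. This paper reports the results and the issues that remain for future work.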

    Egocentric Search Method for Authoring Support

    In this paper we propose egocentric search methods based on the concept of "Information and Communicate Activities Navigation (ICAN)" and an authoring support system for Weblogs (blogs). ICAN regulates human activities from the viewpoint of information and communication support. We introduce the ideas of "Collect" and "Relate" from the ICAN table into information retrieval, yielding a search method that uses the content and human relationships produced by daily blogging. Our egocentric methods provide more subjective search results than conventional engines. We apply the methods to improve the quality of the small pieces of content made with Weblog tools.

    CiNii: Bringing Linked Data to Japan's Largest Scholarly Search Engine

    In this poster, we present our effort to create a large quantity of Linked Data from Japan’s largest scholarly search engine. We design OpenSearch RSS and Bibliography RDF using standard vocabularies. As a result, over 20 million articles with Linked Data are available on the Web.

    From one star to three stars : Upgrading legacy open data using crowdsourcing

    Despite recent open data initiatives in many countries, a significant percentage of the data provided is in non-machine-readable formats like image format rather than in a machine-readable electronic format, thereby restricting their usability. This paper describes the first unified framework for converting legacy open data in image format into a machine-readable and reusable format by using crowdsourcing. Crowd workers are asked not only to extract data from an image of a chart but also to reproduce the chart objects in spreadsheets. The properties of the reconstructed chart objects give their data structures including series names and values, which are useful for automatic processing of data by computer. Since results produced by crowdsourcing inherently contain errors, a quality control mechanism was developed that improves the accuracy of extracted tables by aggregating tables created by different workers for the same chart image and by utilizing the data structures obtained from the reproduced chart objects. Experimental results demonstrated that the proposed framework and mechanism are effective.